The NagY regulator: A member of the BglG/SacY antiterminator family conserved in Enterococcus faecalis and involved in virulence

Enterococcus faecalis is a commensal bacterium of the gastrointestinal tract but also a major nosocomial pathogen. This bacterium uses regulators such as the BglG/SacY family of transcriptional antiterminators to adapt its metabolism during host colonization. In this report, we investigated the role of the BglG/SacY family antiterminator NagY in the regulation of the nagY-nagE operon in the presence of N-acetylglucosamine, with nagE encoding a transporter of this carbohydrate, as well as in the expression of the virulence factor HylA. We showed that this last protein is involved in biofilm formation and glycosaminoglycan degradation, which are important features of bacterial infection, as confirmed in the Galleria mellonella model. To elucidate the evolution of these actors, we performed phylogenomic analyses on E. faecalis and Enterococcaceae genomes, identified orthologous sequences of NagY, NagE, and HylA, and report their taxonomic distribution. The study of the conservation of the upstream regions of the nagY and hylA genes showed that the molecular mechanism of NagY regulation involves a ribonucleic antiterminator sequence overlapping a rho-independent terminator, suggesting a regulation conforming to the canonical model of BglG/SacY family antiterminators. With a view to understanding opportunism, we offer new insights into the mechanism of host sensing through the NagY antiterminator and the expression of its targets.


Introduction
According to the Genome Taxonomy Database, enterococci comprise 13 genera (Enterococcus, Enterococcus-A to J, Melissococcus, and Tetragenococcus). They belong to the Enterococcaceae family and to the order Lactobacillales, together with other families of medical importance such as Streptococcaceae (Ludwig et al., 2009; García-Solache and Rice, 2019). Enterococci are Gram-positive facultative anaerobic bacteria, commonly found in the mammalian intestinal microbiota, and also major health care-associated pathogens, especially Enterococcus faecalis and Enterococcus faecium (Hendrickx et al., 2009; Fiore et al., 2019). As well-documented pathogens, enterococci are associated with various clinical manifestations including urinary tract infections, bacteremia, or endocarditis, and they can also be recovered from cultures of intra-abdominal, pelvic, and soft tissue infections (Agudelo Higuita and Huycke, 2014). Enterococcus faecalis is reported to be responsible for 10% of all infective endocarditis cases (Fernández-Hidalgo et al., 2020; Barnes et al., 2021), and Enterococcus spp. are considered the third causative agent of these infections in Europe (Habib et al., 2019).
Enterococcus faecalis and E. faecium present numerous intrinsic and acquired resistances to antibiotics, which makes the treatment of enterococcal infections particularly challenging (Agudelo Higuita and Huycke, 2014; Faron et al., 2016). Indeed, enterococci are intrinsically resistant to β-lactams, aminoglycosides, or lincosamides, and they can acquire resistance to antibiotics of all classes that have so far been introduced for therapy, such as lipopeptides, cyclines, or glycopeptides (Fiore et al., 2019). These characteristics, which distinguish them from their ancestors, allow them to persist in the modern hospital environment (Lebreton et al., 2017). At the beginning of the 21st century, the rapid increase of vancomycin resistance in enterococci raised alarms because this antibiotic was formerly designated as a "last resort" for the treatment of Gram-positive multidrug-resistant bacteria (Faron et al., 2016; Fiore et al., 2019).
The Enterococcus spp. transition from commensal to pathogen is observed as a result of overgrowth in the colon, which increases, by simple numerical probability, the risk of dissemination into the bloodstream and to other sites, especially in susceptible hosts (Fiore et al., 2019; Kao and Kline, 2019). The ability to obtain nutrients within the competitive environment of the gut is also an important aspect of colonization efficiency (Ramsey et al., 2014; Fiore et al., 2019). During infection of the mouse peritoneum, E. faecalis metabolism genes undergo expression changes that are even greater than those observed for virulence factor genes (Muller et al., 2015). Transcriptomic studies also showed that resistance abilities during mouse infection, or when cells are exposed to stress during colonization, depend more on metabolism or stress response genes than on virulence traits (Muller et al., 2015; Salze et al., 2020a). In this context, enzymes such as hyaluronidases have been poorly investigated in enterococci, even though most Gram-positive pathogenic bacteria produce these proteins as part of their survival and infection strategies. Hyaluronidases are capable of cutting the β-1,4 glycosidic bonds between the N-acetylglucosamine (NAG) and the D-glucuronic acid that compose hyaluronic acid (HA; [-D-glucuronic acid-β1,3-N-acetyl-D-glucosamine-β1,4-]n; Stern and Jedrzejas, 2006). HA is the most widespread of the glycosaminoglycans (GAGs), together with chondroitin and heparin, which are found as components of the extracellular matrix (ECM; Theocharis et al., 2016). As a monosaccharide capable of β-binding to another monosaccharide, NAG is considered a β-glucoside. Therefore, the degradation of HA by hyaluronidases can provide two advantages: firstly, it can facilitate bacterial spread by degrading the HA that composes the host ECM, and secondly, it can provide a source for the carbon and energy requirements of the bacteria (Stern and Jedrzejas, 2006; Kawai et al., 2018).
To metabolize nutrients, bacteria must first internalize them by different systems, especially the phosphoenolpyruvate-sugar phosphotransferase system (PTS), which is involved in carbohydrate uptake and uses the energy derived from glycolysis-produced phosphoenolpyruvate (PEP; Kundig et al., 1964; Deutscher et al., 2006; Galinier and Deutscher, 2017). These systems are usually composed of several proteins: enzyme I (EI), heat-stable protein (HPr), and enzyme II (EII), which are activated successively by phosphorylation (Kundig et al., 1964; Deutscher et al., 2006). The EII is composed of the EIIA, EIIB, and EIIC (occasionally EIID) subunits, which can be combined and are specific for one substrate or a small group of closely related carbohydrates. HPr is also involved in other regulatory mechanisms such as carbon catabolite repression (CCR) for the orderly utilization of secondary carbon sources, or in the control of the activity of proteins containing PTS regulatory domains (PRDs), such as BglG/SacY family antiterminators (Prasad and Schaefler, 1974; Stülke, 2002; Görke, 2003; Deutscher et al., 2006; Görke and Stülke, 2008). Although the Bgl system was first described in Escherichia coli, such systems are highly conserved in bacteria and are also present in Gram-positive bacteria, such as Bacillus subtilis with SacY involved in sucrose utilization (Steinmetz et al., 1988; Arnaud et al., 1996; Stülke et al., 1998; Tortosa et al., 2001). In B. subtilis, SacY is encoded in an operon with the sacX gene (Tortosa and Le Coq, 1995; Deutscher et al., 2006). The SacX protein is a sucrose-specific EIIBC, which has a role in SacY activation in the absence of the carbohydrate (Tortosa et al., 2001). In the presence of sucrose, SacY binds to a specific and conserved sequence called the ribonucleic antiterminator (RAT), located in the RNA 5′ untranslated region (5′UTR) of the sacXY operon (Figure 1; Aymerich and Steinmetz, 1992; Tortosa et al., 2001; Clerte et al., 2013). This binding can lead to the opening of the terminator hairpin, which makes the transcription terminator ineffective and allows transcription of specific genes that are not usually transcribed (Aymerich and Steinmetz, 1992; Tortosa et al., 2001; Clerte et al., 2013).
Herein, we investigate the role of a BglG/SacY antiterminator homolog as a link between the metabolism and the opportunistic features of E. faecalis. An interesting aspect is that β-glucoside metabolism was shown to be induced during infection (Muller et al., 2015), so we studied the regulation of this metabolism in E. faecalis by the ef1515-ef1516 operon encoding a BglG/SacY-like antiterminator (NagY) and a NAG PTS transporter (NagE; Keffeler et al., 2021b). We analyzed the autoregulation mechanism of NagY, and its action on the expression of a hyaluronidase, HylA, identified as an MSCRAMM (Microbial Surface Components Recognizing Adhesive Matrix Molecules). A phylogenomic approach was also used to complete this study in order to elucidate the processes at work in the evolution of the nagY, nagE, and hylA genes among a set of representative Enterococcaceae species and a large sample of E. faecalis strains. This comparative genomics approach allowed us to identify conserved RAT-like motifs involved in the regulation of the expression of these genes.

Molecular biology techniques
Primers used in this study are listed in Supplementary Table 2.
Figure 1. Schematic representation of the SacY mechanism in Bacillus subtilis. The SacX EIIBC enzyme is phosphorylated by HPr from the phosphotransferase system (PTS) pathway, and regulation of the sacX-sacY operon occurs depending on the presence of sucrose. In the absence of the specific carbohydrate (left panel), SacY is phosphorylated and inactivated by SacX. The sacX-sacY transcription is initiated by RNA polymerase (RNAP) but stops at the closed terminator hairpin (indicated in red). In the presence of this carbohydrate (right panel), SacY binds its own mRNA on the untranslated region 5'sacX, and more specifically on the ribonucleic antiterminator sequence (RAT; indicated in green). This interaction promotes the antitermination hairpin (indicated in blue), which prevents the closure of the terminator hairpin and makes the transcription terminator present on 5'sacX ineffective (anti-termination). Sucrose is then phosphorylated by SacX during its import into the cell and metabolized (Clerte et al., 2013).
Frontiers in Microbiology 04 frontiersin.org
primers), and SP3 (sequencing) primers and poly-G oligonucleotides. In the case of uncertainty, poly-A tailing was also used. Purifications of PCR products were performed with the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, Düren, Germany), and plasmid extractions were achieved using the NucleoSpin Plasmid kit (Macherey-Nagel). Digestions were performed using restriction enzymes (Promega, Madison, Wisconsin, United States) and ligations with T4 DNA Ligase (New England BioLabs).
The pLT06 plasmid (Thurlow et al., 2009) was amplified with pLT06_1_bis and pLT06_2 primers, and flanking regions of the region to be deleted were amplified by PCR using oligonucleotides 1 and 2 for the upstream fragment, and oligonucleotides 3 and 4 for the downstream fragment (Supplementary Table 2). Primers 1 and 4 have overlapping tails compatible with the pLT06_1_bis and pLT06_2 primers, and primers 2 and 3 have overlapping tails compatible with each other. Cloning was performed using the in vivo recombination method (Huang et al., 2017). Deletion was obtained by double crossing over, as previously described (Thurlow et al., 2009), and was checked by PCR using primers 5 and 6 (Supplementary Table 2). Gene deletion and the absence of point mutations likely to alter the phenotypes of the deleted strains were checked by whole genome sequencing as described in Supplementary Material and Methods, and a summary of the variant detection analysis is listed in Supplementary Table 3.

Total RNA extraction
Cultures of 10 ml were grown at 37°C in carbon-depleted medium cdM17 (La Carbona et al., 2007) supplemented with the appropriate sugar to an OD600 of 0.5. Cells were pelleted and lysed using a FastPrep device (MP Biomedicals, Illkirch Graffenstaden, France). RNA extraction and purification were achieved with TRIzol Reagent (ThermoFisher) and chloroform/isoamyl alcohol separation before using the Direct-Zol RNA Miniprep kit following the manufacturer's recommendations (Zymo-Research, Irvine, California, United States). RNAs were quantified using a Nanodrop™ 2000 (ThermoFisher) and their quality was checked by electrophoresis.

RT-PCR and RT-qPCR
Reverse transcription for RT-PCR and RT-qPCR assays was performed using the QuantiTect Reverse Transcription kit (Qiagen) with L/R oligonucleotides and random primers, respectively (Supplementary Table 2). The GoTaq qPCR Master Mix (Promega) was used for qPCR with the C1000™ Thermal Cycler (Bio-Rad), using the following conditions: 95°C for 3 min, and 40 cycles at 95°C for 15 s and 60°C for 1 min. Normalization was performed using the gyrA reference gene. Standard curves of each gene and qPCR efficiency were obtained using genomic DNA of the E. faecalis V19 strain.
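The relationship between a standard curve, the qPCR efficiency, and a reference-normalized fold change can be illustrated with a minimal Python sketch. The text does not state the exact calculation used, so the efficiency-corrected (Pfaffl-style) ratio below is an assumption, and all function names and Cq values are hypothetical.

```python
def standard_curve_slope(log10_quantities, cq_values):
    """Least-squares slope of Cq versus log10(template quantity)."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    return sxy / sxx


def amplification_factor(slope):
    """Per-cycle amplification factor; 2.0 corresponds to perfect doubling."""
    return 10 ** (-1.0 / slope)


def normalized_fold_change(e_target, cq_target_ctrl, cq_target_test,
                           e_ref, cq_ref_ctrl, cq_ref_test):
    """Efficiency-corrected ratio of test versus control condition,
    normalized to a reference gene (assumed here to play the role of gyrA)."""
    target_ratio = e_target ** (cq_target_ctrl - cq_target_test)
    ref_ratio = e_ref ** (cq_ref_ctrl - cq_ref_test)
    return target_ratio / ref_ratio
```

With an amplification factor of 2 for both genes, a target gene whose Cq drops by four cycles while the reference Cq stays constant gives a fold change of 16.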

In vitro production of RNA
RNA synthesis was achieved with the pTOPO plasmid and in vitro synthesis. DNA regions of interest were amplified by PCR using primers containing tails overlapping with pTOPO: topo85_FP1 and topo85_RP1 for 5'nagY, topo5'3023_FP1 and topo5'3023_RP1 for 5'hylA, and topo65_FP1 and topo65_RP1 for SRC65 (Supplementary Table 2). The plasmid was amplified with the topo_FP2 and topo_RP2 primers. PCR products were inserted into the plasmid with the NEBuilder HiFi DNA Assembly Cloning Kit (New England BioLabs), and the products were used to transform E. coli TOP10. RNA production was performed on the resulting plasmid, linearized with the SpeI restriction endonuclease, using the MAXIScript™ T7 in vitro Transcription Kit (Invitrogen, Carlsbad, California, United States). Unincorporated nucleotides were eliminated by ammonium acetate/ethanol precipitation, as recommended in the kit protocol, and RNAs were quantified with a Nanodrop™ 2000 (ThermoFisher).

Synthesis and purification of recombinant NagY protein
The nagY gene was amplified with primers ef1515_pQE70_SphI and ef1515_pQE70_BglII carrying restriction sites (Supplementary Table 2). The pQE70 plasmid and the PCR product were digested with SphI and BglII enzymes, ligated together, and used to transform E. coli M15 pRep4. The bacteria obtained were grown at 37°C with agitation in Terrific Broth medium supplemented with kanamycin and ampicillin until an OD600 of 0.5. Expression was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 4 h at 37°C under agitation. NagY protein was purified with the Protino Ni-NTA Agarose kit according to the manufacturer's instructions (Macherey-Nagel), and desalted with PD10 columns (GE Healthcare, Chicago, Illinois, United States). Protein concentration was determined with the BCA test (ThermoFisher Pierce), and its purity was checked on 12.5% SDS-PAGE and by mass spectrometry.

MicroScale thermophoresis (MST)
Recombinant NagY protein was labeled using the His-Tag Labeling Kit RED-tris-NTA 2nd generation Monolith (NanoTemper Technologies, München, Germany) following the manufacturer's recommendations, and diluted in ES-Buffer (10 mM Tris pH 8.0, 40 mM NaCl, 10 mM KCl, 1 mM MgCl2, 0.05% Tween-80) at a final concentration of 25 nM. Before the assay, RNAs were heated for 5 min at 70°C and slowly cooled down to room temperature to allow proper formation of secondary structures. A 1:1 serial dilution of each RNA was prepared to obtain RNA concentrations ranging from 0.015 nM to 492 nM (16 points). Each of the 16 RNA dilutions was then mixed with the labeled protein (1:1), loaded into capillaries, and introduced into the Monolith NT.115 Pico instrument (NanoTemper Technologies). Data of at least three independently pipetted measurements were analyzed (MO.Affinity Analysis software version 1.5.41, NanoTemper Technologies). The data were fitted with the law of mass action in GraphPad Prism version 5, and MicroScale thermophoresis (MST) figures were generated using MO.Affinity Analysis.
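The 16-point dilution series and the 1:1 binding model implied by the law of mass action can be sketched in Python as follows. This is an illustration only, not the fitting routine of MO.Affinity Analysis or GraphPad Prism; the function names are hypothetical, and the concentrations passed to `fraction_bound` are assumed to be those in the final mixture.

```python
import math


def serial_dilution(top_nm, n_points):
    """Concentrations (nM) of a 1:1 (two-fold) serial dilution, highest first."""
    return [top_nm / 2 ** i for i in range(n_points)]


def fraction_bound(ligand_nm, target_nm, kd_nm):
    """Fraction of labeled target in complex for a 1:1 interaction.

    Solves Kd = (L - C)(T - C) / C for the complex concentration C,
    taking the physically meaningful root of the quadratic.
    """
    total = ligand_nm + target_nm + kd_nm
    complex_nm = (total - math.sqrt(total ** 2 - 4.0 * ligand_nm * target_nm)) / 2.0
    return complex_nm / target_nm


# RNA (ligand) series starting from the highest concentration given in the text.
rna_series = serial_dilution(492.0, 16)
```

Sixteen two-fold steps starting at 492 nM end at about 0.015 nM, which matches the range stated above. Fitting `fraction_bound` to the normalized MST signal with the Kd as free parameter is one common way to estimate the dissociation constant.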

Biofilm study
Overnight cultures of E. faecalis in GM17 were adjusted to an OD600 of 0.2 in fresh M17 supplemented with 2% glucose. One hundred microliters of the bacterial suspension were inoculated into CytoOne polystyrene microwell plates coated with 1 μg/ml of hyaluronic acid (Sigma-Aldrich), chondroitin sulfate (Carl Roth, Karlsruhe, Germany), or heparin sodium (ThermoFisher). After 24 or 48 h of incubation at 37°C, the plates were washed with 0.9% NaCl to remove unbound bacteria. Each well was then stained with 0.1% (wt/vol) crystal violet for 15 min at room temperature. Wells were then rinsed twice with 0.9% NaCl. The stain retained by adherent cells was dissolved in 30% acetic acid, and the OD550 was measured using a Nano Tecan microplate reader (Life Science). At least three experiments were performed for each condition.

Detection of glycosaminoglycans-degrading activity
Detection of GAG-degrading activity was performed as previously described (Kawai et al., 2018), with the following modifications. Overnight cultured cells were centrifuged at 4,500 rpm for 10 min, washed with 1 ml of 0.9% NaCl or GM17, and resuspended in the saline or GM17. The volume (X μl) of the saline or GM17 was calculated by the following formula: X = 200 × OD600. The cell suspension was then spotted at the center of the GAG minimal plate [0.2% GAG (hyaluronic acid, chondroitin sulfate, or heparin sodium), 0.1% yeast extract, 0.1% Na2HPO4, 0.1% KH2PO4, 0.01% MgSO4·7H2O, and 1.5% agar] with BSA at 1% and cultured at 37°C for 7 days. After cell growth, 1 ml of 2 M acetic acid was added to form a complex with the remaining GAG and BSA as white precipitates.

Virulence study on a Galleria mellonella model
Infection assays were performed on Galleria mellonella larvae as previously described (Benachour et al., 2012). For each experiment, 15 caterpillars per strain were inoculated with a dose of 2 × 10^6 CFU. At least three experiments were performed for each condition. G. mellonella survival was followed from 16 h post-infection and for 24 h.

Phylogenomic analyses
Genome samples, annotation, and quality assessment
Genomes were retrieved from the NCBI website (https://ftp.ncbi.nlm.nih.gov/genomes/, last accessed November 17, 2021). DNA sequences of 2,064 E. faecalis genomes were extracted, and the GTDB database was used to select a set of 81 genomes of Enterococcaceae species, annotated as representatives (https://gtdb.ecogenomic.org/; Parks et al., 2022), and including the ATCC 19433 strain genome for E. faecalis (Supplementary Table 4). All genomes were annotated with Prokka (version 1.14.6; Seemann, 2014), and protein domains were annotated with the hmmscan program of the HMMER suite (version HMMER 3.1b2; Eddy, 2011). Three filters were used to ensure the quality of the genomes selected. First, we identified genomes for which the number of coding sequences predicted by Prokka is an outlier. Then, the quality assessment of E. faecalis genomes was performed using CheckM (version 1.1.3; Parks et al., 2022). Finally, the Mash software (version 2.3; Ondov et al., 2016) was used to identify genomes incorrectly classified as E. faecalis. The intersection of the lists of genomes retained leads to a set of 1,949 E. faecalis strains.
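The logic of combining the three quality filters can be sketched in Python as a set intersection. The outlier criterion is not specified in the text, so the Tukey fence (1.5 × interquartile range) used below is an assumption, and the genome identifiers and function names are hypothetical.

```python
from statistics import quantiles


def cds_count_inliers(cds_counts, k=1.5):
    """Genome IDs whose predicted CDS count lies within Tukey fences.

    cds_counts maps a genome ID to its number of predicted coding sequences.
    The fence factor k is an assumed criterion, not taken from the study.
    """
    q1, _, q3 = quantiles(list(cds_counts.values()), n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {gid for gid, count in cds_counts.items() if low <= count <= high}


def retained_genomes(*passing_sets):
    """Genomes that pass every filter (intersection of the per-filter lists)."""
    return set.intersection(*(set(s) for s in passing_sets))
```

A genome is kept only if it appears in the list retained by each of the three filters, so a failure in any single check is enough to discard it.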

Pan-genomes
The core genome was used for inferring phylogenetic trees and the accessory genome to study the adaptation of different strains to their biotope (Tettelin et al., 2005). To analyze the pan-genomes, the phylogeny enhanced pipeline for pan-genome (PEPPAN) was used for the Enterococcus species, and  (Minh et al., 2020) with the selected model Q.LG+F+R8, and branch support values were determined using ultra-fast bootstrap approximation (ufboot) and SH test (alrt) with 1,000 replicates. The tree was rooted on the Enterococcus sp. from Marseille-P2817 strains (Supplementary Figure 1). The E. faecalis strain tree was calculated from the concatenated alignments of the OG clusters identified by Panaroo, and the tree was constructed as described for Enterococcaceae but with the IQ-TREE fast version (Minh et al., 2020).

NagY, NagE, and HylA orthologs
To identify the set of Enterococcaceae proteins that were orthologous to NagY, NagE, and HylA from E. faecalis V583, we used the OG clusters calculated with the PEPPAN pipeline and the genetic context conservation (Supplementary Table 5). The genetic context of the genes was obtained by extracting the gene positions from the Prokka GFF files. The annotation files were prepared with in-house scripts, and the trees were annotated and visualized with the online tool Interactive Tree Of Life (iTOL v6; https://itol.embl.de; Letunic and Bork, 2019). The Pfam domain annotation of the proteins was predicted with the hmmscan program (Eddy, 2011). GeneRax (version 2.0.4; Morel et al., 2020) was then used to infer a rooted family tree directly from the multiple sequence alignment and a rooted species tree. Each dataset was aligned with mafft, and the amino acid substitution model that best fit the data was selected with modeltest-ng (v0.1.7; Darriba et al., 2020). In addition to the rooted protein family tree, the program returns statistics on the events predicted by the reconciliation (speciation, speciation+loss, duplication, and transfer). For the identification of OGs in E. faecalis strains, the V583 protein sequences were used as queries with mmseqs2 (Steinegger and Söding, 2017) against the entire E. faecalis proteomes annotated by Prokka. Next, we identified the Panaroo OGs to which significant hits belong and extracted all proteins from each OG.

Identification of conserved motifs
The conserved motifs in the Enterococcaceae 5'nagY regions were identified with MEME 5.3.0 (Bailey and Elkan, 1994). A secondary structure search was performed with RNAfold 2.4.14 from the Vienna RNA package (Lorenz et al., 2011) and validated with the rLocARNA 2.8.ORC8 software, which simultaneously folds and aligns the input sequences. The alignment obtained with rLocARNA was used to construct a covariance model that combines primary sequence conservation and RNA secondary structure (cmbuild and cmcalibrate from the Infernal package 1.1.4; Nawrocki and Eddy, 2013). The cmsearch program (Infernal package 1.1.4) was used to search for these patterns in the DNA sequences.
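The order of the three Infernal steps (build the covariance model, calibrate it, then search) can be sketched in Python as a small helper that assembles the command lines. The file names are hypothetical, no options beyond `--tblout` are assumed, and the exact parameters used in the study are not given in the text.

```python
def infernal_commands(alignment_sto, cm_path, genome_fasta, hits_tbl):
    """Return the Infernal command lines as argument lists (nothing is executed).

    alignment_sto: structural alignment (Stockholm format assumed)
    cm_path:       covariance model file written by cmbuild
    genome_fasta:  sequences to scan
    hits_tbl:      tabular hit report written by cmsearch
    """
    return [
        ["cmbuild", cm_path, alignment_sto],      # alignment -> covariance model
        ["cmcalibrate", cm_path],                 # fit E-value parameters
        ["cmsearch", "--tblout", hits_tbl, cm_path, genome_fasta],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where Infernal is installed. Calibration comes before the search because cmsearch relies on it to report E-values.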

Identification and phylogenetic distribution of the nagY and nagE orthologs in Enterococcaceae
The ef1515 gene in E. faecalis V583 is annotated as encoding a BglG/SacY family antiterminator, and we renamed it nagY in reference to sacY, and its following gene nagE (Mao et al., 2009). The EF1515 (EF_RS07320 in the new nomenclature) protein presents 36% identity with SacY from B. subtilis, and 31% amino acid identity with BglG from E. coli (blastP alignment). This gene is followed by the ef1516 (EF_RS07325) gene (Figure 2), recently renamed nagE on V583 genome, encoding a NAG specific EIICBA PTS transporter (Keffeler et al., 2021b). Thus, the NagY and NagE proteins belong to multigene families. In order to unambiguously identify the orthologous genes encoding these proteins in the Enterococcaceae, we performed a phylogenomic analysis of these families.
The PEPPAN pipeline (Zhou et al., 2020) was used to analyze the pan-genomes of the 81 Enterococcaceae reference genomes. E. faecalis V583 nagY and nagE belong to two OG clusters composed of 47 and 54 members, respectively. Both genes appear to be present in the majority of the Enterococcaceae studied (Figure 3). They are absent in the genera Enterococcus_J, _H, _E, and _G and weakly represented in the genus Enterococcus_D and in Tetragenococcus. Note that the outgroup position of Enterococcus_J and Enterococcus_H suggests that these genes were absent in the last common ancestor (LCA) of Enterococcaceae. Two nagE paralogs are present in E. thailandicus DSM 21767, and a tblastn search, with the candidate genes as query, identifies two nagE-like sequences in two genomes (E. sp. 9E7_DIV0242 and E. sp. AS17jrsBPGB_10) that were not annotated by Prokka (Figure 3). Chromosomal neighborhood analysis of the nagY genes reveals, in all cases, the presence of a downstream nagE gene. However, in eight genomes, a nagE gene is present without its nagY partner. Five genomes belong to a subtree composed of E. sp. 10A9_DIV0425 and four E. mundtii genomes (Figure 3), suggesting that the nagY gene may have been lost in their LCA. One of the two E. thailandicus DSM 21767 paralogs does not have a nagY gene in its chromosomal neighborhood. Note that nagE genes encode proteins with the EIICBA architecture, like PtsG of B. subtilis (and B. cereus), while BglF from E. coli has the EIIBCA domain order.
In addition to gene losses, GeneRax predicted that horizontal gene transfers (HGT) may have occurred, with a higher frequency for nagE (17 HGTs for 54 genes) than for nagY (7 HGTs for 47 genes; Supplementary Figures 2A,B). Some HGTs may have occurred between genomes belonging to different genera, as revealed by the splitting of these genera on the protein trees. It should be noted that no conservation of gene neighborhoods was found in the genomes studied. Only a majority of strains of the genus Enterococcus present a conserved genetic context,  Figure 2).
To determine the extent to which the orthologous gene pair nagY-nagE is present in E. faecalis strains, we searched the 1,949 genomes for orthologs of both gene products. The NagY and NagE proteins were identified in 99.84 and 99.59% of the genomes, respectively, and both are present in 99.38%. Missing genes are due to incomplete genome assemblies.

Characterization of the nagY-nagE operon and its regulation
We experimentally characterized the nagY-nagE operon (represented in Figure 2) by confirming the co-transcription of these two genes by RT-PCR (Supplementary Figure 3). The induction of nagY-nagE expression in the presence of NAG was then revealed by RT-qPCR targeting nagE (Figure 4). This gene is overexpressed in the WT strain in the presence of NAG as sole carbon source, with a fold change (FC) of 16.6 (±2.15) compared to glucose conditions (p < 0.001), and this induction is strongly reduced in the ΔnagY strain (FC = 3.1 ± 1.55, p < 0.001).
To investigate the role of NagY in the regulation of its own operon, we first identified the transcription start site by 5′RACE-PCR assay. Consistent with previous studies (Innocenti et al., 2015; Muller et al., 2015; Michaux et al., 2020), the first base of the RNA was confirmed to be located 172 bp upstream of the predicted initiation codon (Figure 2; Supplementary Figure 4A), showing the existence of a long 5′UTR (named 5′nagY). A search for a RAT sequence based on the consensus defined in previous work in E. coli and B. subtilis (Aymerich and Steinmetz, 1992; Tortosa et al., 2001; Gordon et al., 2015) allowed the identification in the 5′nagY of an imperfect inverted repeat with low sequence conservation relative to the consensus sequence (Figure 2). A search performed with MEME (Bailey and Elkan, 1994) in the upstream regions of nagY orthologs reveals the presence of two conserved motifs. The first motif covers the region overlapping the putative RAT sequence identified above in E. faecalis V583. The second motif, located downstream of the first motif, is less well conserved but is characterized by a terminal T-rich region (Supplementary Figure 5). These motifs are conserved with their relative positions upstream of the 46 nagY sequences of Enterococcaceae, with the exception of E. saccharolyticus (Figure 3). Alignment of the first motif with rLocARNA reveals a conserved stem loop with a two-nucleotide bulge (Supplementary Figure 5A). The stem bases have undergone a large number of compensatory mutations that maintain this secondary structure. The region bounded by the two conserved MEME motifs was extracted from the 5'nagY for the different genomes. In all sequences, rLocARNA predicts the presence of a large stem loop of variable length that ends in a T/U-rich region, a structure typical of a rho-independent terminator (Supplementary Figure 5B). The foot of this structure overlaps the end of the RAT region (common consensus sequence: GCRUGGA). These two structures are therefore mutually exclusive (Supplementary Figure 5C). This competition between the two structures is typical of what has been observed for BglG/SacY antiterminators. A covariance model was constructed with the alignment obtained with rLocARNA and was used to identify the RAT-terminator motif with high specificity in genomic sequences (Figure 3).
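As an illustration of how a degenerate consensus such as GCRUGGA can be located in a sequence, the following Python sketch expands IUPAC codes (R = A or G) into a regular expression. It is a plain sequence match, far simpler than the covariance model used in the study, which also scores secondary structure. The function names and the test sequence in the usage note are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes, expressed for RNA.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "N": "[ACGU]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex; lookahead allows overlapping hits."""
    body = "".join(IUPAC_RNA[code] for code in consensus.upper())
    return re.compile("(?=" + body + ")")


def find_consensus(sequence, consensus="GCRUGGA"):
    """Return 0-based start positions of the consensus in a DNA or RNA sequence."""
    rna = sequence.upper().replace("T", "U")
    return [m.start() for m in consensus_to_regex(consensus).finditer(rna)]
```

For example, `find_consensus("aaGCATGGAttGCGUGGAcc")` returns `[2, 11]`, since both GCAUGGA and GCGUGGA satisfy the R position, whereas a sequence containing only GCCUGGA yields no hit.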
To confirm that the nagY-nagE transcription-antitermination mechanism is similar to the SacY model in B. subtilis, the direct interaction between NagY and 5'nagY was studied by MST. The purified protein was incubated in the presence of the in vitro produced 5'nagY RNA. As shown in Figure 5, we observed a dose-response binding of NagY to 5'nagY, with a Kd of 4.18 nM (±0.42 nM). The SRC65 sRNA (Shioya et al., 2011; Salze et al., 2020b) was used as negative control and did not show any interaction with NagY. These results indicate that NagY has a high affinity for its 5′UTR and regulates both NagE and its own expression by binding to the 5'nagY RNA. Consequently, binding of NagY to RATs can lead to the opening of the hairpin base and therefore make the transcription terminator ineffective.
Figure 2. Representation of the nagY-nagE operon and 5'nagY sequence. Transcription start (+1), ribosome binding, and translation start (in bold) sites are indicated. The sequence deleted in the Δ5'nagY mutant is in capital letters. The RAT-like sequences are underlined in green, and the terminator and the antiterminator identified are overlined by inverted arrows in red and blue, respectively.
To confirm that these sequences are involved in the NAG-dependent induction of nagY-nagE, a deletion of the 5'nagY region overlapping both structures was constructed in the WT strain (Figure 2), and the expression of nagE in this mutant was determined by RT-qPCR (Figure 4). No NAG-dependent induction was observed when we compared NAG to glucose culture conditions, but nagE is deregulated regardless of the culture conditions, with FC of 33.2 and 34.8 in the presence of glucose and NAG, respectively (Supplementary Figure 6). Considering the role of NagE in NAG transport, the operon could potentially be regulated by catabolite repression. The expression of the operon was therefore also followed in the presence of both glucose and NAG, but no difference in expression was observed compared to the condition with NAG only (Supplementary Figure 6). These results indicate that the nagY-nagE operon expression is not under the control of catabolite repression.
Figure 3. Distribution of NagY, NagE, and HylA protein families in Enterococcaceae genomes. First panel on the left: the Enterococcaceae species tree inferred with the 526 orthologous gene (OG) clusters present in at least 95% of the genomes. The tree was rooted with Marseille-P2817 strains (Supplementary Material and Methods). The 13 genera of the Enterococcaceae family described in the GTDB and the four phylogenetic groups described in Lebreton et al. (2017) are reported. The tree is perfectly resolved except for one branch of the Tetragenococcus subtree colored in red (13.1/54, ufboot/alrt supports). Second panel in the middle-left ("Orthologs"): distribution of proteins orthologous to NagY, NagE, and the short and long Lyase_8 forms. Lightened colors indicate the presence of pseudogenes. Third panel in the middle-right ("RATs"): occurrence of the RAT sequence in front of the nagY and hylA genes. Last column on the right ("Metadata"): metadata available in the GenBank files of the analyzed genomes.

Identification of a new NagY target gene encoding a polysaccharide lyase
As a regulator that binds nucleic acid, NagY could potentially regulate the expression of other genes by recognizing a conserved sequence. To identify potential NagY targets, we searched for the RAT/terminator motif with the cmsearch software (Infernal package; Nawrocki and Eddy, 2013) in the E. faecalis V583 genome. We obtained two hits, the highest upstream of the nagY gene, and the second upstream of the ef3023 (ef_rs14340) monocistronic gene, named hylA, which shares 88% identity with the nagY RAT-like sequence. This suggests that hylA possesses a 5'UTR (named 5'hylA) to which NagY could potentially bind to regulate the expression of this gene. We confirmed the existence of a 193 bp long 5'hylA by 5'RACE-PCR (Supplementary Figure 4B), and the interaction between NagY and this RNA was studied by MST (Figure 5). This assay revealed binding of the protein to the 5'hylA RNA, with a lower affinity than for 5'nagY (Kd of 7.46 ± 0.63 nM). Moreover, RT-qPCR assays showed that the expression of hylA depends on the presence of the nagY gene (Figure 4). These results are consistent with the hypothesis that NagY also regulates the expression of hylA, suggesting that nagY, nagE, and hylA belong to the same regulon and potentially the same carbohydrate consumption pathway.
HylA was identified as a cell-wall anchored protein, annotated as a polysaccharide lyase 8 [Lyase_8_N (PF08124), Lyase_8 (PF02278), and Lyase_8_C (PF02884.17) domains, Figure 6]: this group of enzymes targets uronic acid-containing polysaccharides such as some GAGs (hyaluronate, chondroitin, or heparin) that are components of the ECM (Sillanpaa et al., 2004, 2009; Lombard et al., 2014). To our knowledge, the function and substrate of E. faecalis HylA are unknown, although it is annotated as a hyaluronidase in the KEGG database (Kanehisa et al., 2017) and was identified as a mostly extracytoplasmic MSCRAMM family member (Sillanpaa et al., 2004, 2009). In addition to a signal peptide and the lyase regions, HylA possesses other domains: (i) an F5/8 type C domain (discoidin domain) that is a major domain of many blood coagulation factors (F5_F8_type_C PF00754), (ii) a bacterial Ig-like domain (Big_2 PF02368) found in bacterial cell-adhesion molecules, mediating the intimate bacterial host-cell interaction (Kelly et al., 1999), (iii) FIVAR domains (Found In Various Architectures PF07554) mostly associated with binding domains in cell wall-associated proteins, and (iv) an LPXTG cell wall anchor motif (Gram_pos_anchor PF00746) present in virulence factors produced by Gram-positive pathogens.

Identification and phylogenetic distribution of the hylA homologs in Enterococcaceae
We used the two largest conserved domains of the protein to identify homologous sequences in Enterococcaceae (PF08124, [...] (Figure 3). The long Lyase_8 protein found in the genomes of E. faecalis V583 and OG1RF strains is also found in the genome of E. hirae ATCC 9790 (Figure 3). GeneRax predicted that the hylA sequences of E. hirae and E. faecalis originated from HGTs, but presumably from genomes that are not sampled in our study (Supplementary Figure 2C). A cmsearch of the upstream region of the E. hirae hylA gene reveals the presence of the RAT-containing motif with the T-rich region just downstream, as observed in the 5'nagY sequence. This conservation suggests similar regulation of hylA by NagY in E. hirae and E. faecalis strains. It should also be noted that the RAT sequence in 5'hylA is conserved only in the presence of the long Lyase_8 (Figure 3).

Figure 4. Study of nagE and hylA induction of expression in the presence of NAG compared to the glucose condition. The nagE and hylA gene expression in WT (white), ΔnagY (gray), and Δ5'nagY (black) strains was revealed by RT-qPCR, with RNA purified from cultures with NAG as sole carbon source compared to the glucose condition. Error bars represent the standard error of triplicate measurements. nd: not determined.

Figure 5. Study of the interaction of NagY with the 5'UTR RNA of its target genes. MST dose-response curves for the interaction between labeled NagY protein and 5'nagY (black circles) or 5'hylA (empty circles) RNA (ligand), and SRC65 sRNA (used as negative control; black triangles). Error bars represent the standard error of triplicate measurements.

To better understand the origin of HylA in E. faecalis, we ran a BLASTP search on the NCBI website with the EF3023 sequence. Even though many of the sequences retrieved are from E. hirae, sequences from Lacticaseibacillus genomes and from different species of Listeria, Staphylococcus, and Streptococcus were also found. Some of them are partial, but others, such as those of S. agalactiae or L. monocytogenes, have a domain organization similar to that of the E. faecalis sequences. The high sequence conservation between these distant species suggests recent HGTs.
To determine the extent to which the orthologous gene hylA is present in E. faecalis strains, we searched for occurrences of hylA in the 1,949 E. faecalis proteomes. We identified 1,520 occurrences, among which 458 sequences are partial, with a gene length shorter than 3,000 nt (Figure 7). The hylA gene appears to be well distributed in E. faecalis strains; however, it is absent in some closely related species forming subtrees on the species tree, suggesting that it has been lost in their LCA. Similarly, partial sequences are found in genomes that are closely related on the species tree, which could indicate pseudogenization of these genes in subsets of related genomes. These hylA genes encode HylA proteins with a variable number of FIVAR domains (from 1 to 9, with a distribution centered on four copies). The short version of the protein is present in 1,243 E. faecalis proteomes, with 26 gene fissions.
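The split between partial and full-length hylA sequences reported above rests on a simple length cutoff. As a minimal sketch, assuming gene lengths are available as a list of integers, the classification and the resulting proportions could be written as follows. The function name, variable names, and example lengths are hypothetical; only the 3,000 nt threshold and the counts 1,520 and 458 come from the text.

```python
def classify_by_length(gene_lengths_nt, threshold_nt=3000):
    """Split gene lengths into (partial, full_length) using the cutoff from the text."""
    partial = [n for n in gene_lengths_nt if n < threshold_nt]
    full_length = [n for n in gene_lengths_nt if n >= threshold_nt]
    return partial, full_length

# Hypothetical lengths, for illustration only (not data from the study).
example_lengths = [1200, 2999, 3000, 4100]
partial, full_length = classify_by_length(example_lengths)

# Counts reported in the text.
total_occurrences = 1520
partial_occurrences = 458
full_length_occurrences = total_occurrences - partial_occurrences
partial_percent = round(100 * partial_occurrences / total_occurrences, 1)

print(len(partial), len(full_length))
print(full_length_occurrences, partial_percent)
```

With the reported counts, 1,062 occurrences are at or above the cutoff and about 30.1% of the occurrences are classified as partial.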

Characterization of HylA
To determine HylA functions, the ΔhylA mutant was constructed and characterized. We observed that the growth of this mutant was not affected in the presence of NAG as sole carbon source (Supplementary Material and Methods; Supplementary Figure 7). HylA has a putative hyaluronidase domain, and hyaluronic acid is a polymer made up of alternating NAG and glucuronic acid residues linked by glycosidic bonds (Hynes and Walton, 2000), which could provide the functional link between hylA and the nagY-nagE operon. As shown in Figure 8A, unlike the WT strain, the ΔnagY and ΔhylA mutants could not degrade this substrate, nor heparin sodium or chondroitin sulfate, confirming that the encoded proteins are involved in the use of these GAGs.
Given the adhesin domains identified in the HylA protein, biofilm formation was assessed using the microtiter plate assay on a coating of hyaluronic acid, chondroitin sulfate, or heparin sodium (Figure 8B; Supplementary Figure 8). While no difference was observed after 24 h, a significant 2.3- and 3.2-fold decrease in crystal violet staining was observed after 48 h for the ΔnagY and ΔhylA mutants, respectively, on microtiter plates coated with hyaluronic acid (Figure 8B), and 1.9/1.7- and 1.5/1.6-fold decreases on plates coated with heparin or chondroitin compared to the WT strain (p < 0.0001; Supplementary Figure 8).
To investigate the role of NagY and HylA in virulence in vivo, we monitored the survival of G. mellonella larvae infected by the WT, the two mutant strains, and the Δ5'nagY mutant strain, in which NagY is deregulated and overexpressed regardless of the conditions [...] (Figure 9). However, larvae infected by the ΔhylA and Δ5'nagY strains showed a significant increase in survival relative to the parental WT strain (p < 0.001 and p < 0.01, respectively; Figure 9).

Discussion
When the gastro-intestinal microbiota is at equilibrium, E. faecalis is a subdominant species, but during dysbiosis (induced by antibiotic treatments, for example), it overgrows and crosses the intestinal barrier, giving rise to intestinal translocation and infection (Ubeda et al., 2010; Archambaud et al., 2019). During colonization, E. faecalis has to use specific mechanisms to adapt to a new environment and find carbohydrates such as mono- and polysaccharides or mucin components. NAG is one of the main nutrients used by bacteria during colonization (Chang et al., 2004). This sugar is found in large amounts in the gastro-intestinal tract and is a component of the GAGs that make up the ECM. Our study of 1,949 E. faecalis strains revealed that nagY and nagE were present in 99.84 and 99.59% of the genomes, respectively, and that both were present in 99.38%. The absence of these genes is likely due to incomplete genome assemblies. The analysis of a set of 81 reference genomes of Enterococcaceae showed that this gene pair is conserved in 47 genomes (Figure 3). Consequently, nagY-nagE is highly conserved in E. faecalis, but was lost or transferred frequently during the evolution of Enterococcaceae, illustrating the adaptation of genomes to the presence of NAG in their environment.
We observed that this operon is autoregulated through the binding of NagY to the 5'nagY RNA, implicating a cis-acting regulatory element containing a small secondary structure overlapping a rho-independent terminator. This terminator is conserved upstream of nagY in the Enterococcaceae, as evidenced by the presence of compensatory mutations that preserve the structures (Figure 3; Supplementary Figure 5). Our results suggest that this first structure may interfere with the formation of the transcriptional terminator and therefore prevent early transcription termination. This structure was identified as the RAT sequence: the mechanism of NagY regulation in E. faecalis, and most likely in other Enterococcaceae, is consequently similar to the accepted model in B. subtilis and E. coli (Aymerich and Steinmetz, 1992; Amster-Choder, 2005). Transcription of nagY-nagE is constitutively initiated but stops at the terminator structure upstream of the coding region unless β-glucosides are present (Figure 1). Thus, the NagE PTS transporter allows the NagY antiterminator to sense the carbohydrate source in the environment (Tortosa et al., 2001). NAG is then phosphorylated by NagE during its import into the cell (Keffeler et al., 2021b) and is directly used in glycolysis and metabolized. We showed that nagY-nagE is not subject to catabolite repression. The consensus cre sequence WTGWAARCGYWWWC (Suárez et al., 2011) is indeed modified by the insertion of a nucleotide (ATGAATAGCGTTTTC), which probably interferes with catabolite repression. It should be noted that the transcription unit controlled by SacY is also not subject to CCR (Stülke et al., 1998). Moreover, the induction of nagE expression is weak but still present in the absence of nagY when the strain is cultivated with NAG as sole carbon source (Figure 4; Supplementary Figure 6), suggesting another level of regulation.
Whereas the nagY gene in E. faecalis does not appear to be directly involved in pathogenesis in our caterpillar model with the ΔnagY strain, the significant increase in survival following infection with the Δ5'nagY strain (in which NagY is constitutively expressed) highlights the involvement of the antiterminator in virulence (Figure 9). Its homolog in L. monocytogenes was also demonstrated to be a virulence factor (Abdelhamed et al., 2019). NagY is therefore presumed to be associated with pathogenesis, even though no clear correlation between the presence of the nagY gene and the clinical origin of isolates was found (Figure 3). NagY regulates not only the expression of its own operon, but also that of the hylA gene, encoding a hyaluronate lyase, HylA, able to provide a nutrient source from GAGs. Consequently, this hyaluronidase represents an advantage for nutrient recovery in the host and for the infection process of E. faecalis. However, this activity is very low in our culture conditions, since no growth can be observed with GAGs such as hyaluronic acid, chondroitin, or heparin as sole carbon source (Supplementary Figure 7), although GAG degradation can be observed (Figure 8A). A previous report also showed that E. faecalis slightly degrades heparin (Kawai et al., 2018). Indeed, enterococci show little ability to degrade GAGs, and preferentially use unsaturated GAG disaccharides produced by other bacteria in the human gut microbiota (Kawai et al., 2018). Cross-feeding by anaerobes is, moreover, considered to be the major driver of polysaccharide degradation: the ability of enterococci to utilize such nutrients in vivo would obviously depend on their potential to compete effectively for them with other members of the microbiota. In this context, HylA could be used as a backup to favor E. faecalis survival and competition with other microorganisms in the gastro-intestinal microbiota.
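The comparison between the nagY-nagE site and the cre consensus made earlier in this Discussion can be checked mechanically. The sketch below is illustrative and is not the authors' method: it expands the IUPAC codes used in the consensus (W = A/T, R = A/G, Y = C/T) and tests which single-nucleotide deletion of the observed 15-nt sequence restores a match to the 14-nt consensus. The function names are hypothetical.

```python
# IUPAC nucleotide codes needed for the cre consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "R": "AG", "Y": "CT",
}

def matches_consensus(seq, consensus):
    """Return True if seq has the same length as consensus and fits it base by base."""
    if len(seq) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, consensus))

def restoring_deletions(seq, consensus):
    """Return 0-based positions whose deletion from seq yields a consensus match."""
    return [
        i for i in range(len(seq))
        if matches_consensus(seq[:i] + seq[i + 1:], consensus)
    ]

CRE_CONSENSUS = "WTGWAARCGYWWWC"   # Suárez et al., 2011, as quoted in the text
OBSERVED = "ATGAATAGCGTTTTC"       # sequence quoted in the text

positions = restoring_deletions(OBSERVED, CRE_CONSENSUS)
print(matches_consensus(OBSERVED, CRE_CONSENSUS), positions)
```

Under these assumptions, only removal of the sixth nucleotide (the T in ATGAAT) restores a match, which is consistent with a single-nucleotide insertion relative to the consensus.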
In contrast to the nagY-nagE operon, hylA gene expression was shown to be under the control of the RpoN sigma factor and CCR, suggesting a multifactorial regulation of this gene (Keffeler et al., 2021a). Thus, the nagY, nagE, and hylA genes could be involved in the adaptation of Enterococcaceae through the use of different carbohydrate sources. As a hyaluronidase, i.e., an endolytic glycoside hydrolase, the HylA protein would be complementary to EfChi18A-EfCBM33A and EndoE, described in a recent study by Keffeler et al. (2021b), in retrieving NAG from the environment. These enzymes allow E. faecalis to utilize poly-β1,4-linked N-acetylglucosamine, found in chitin, as a carbon source. NAG intake is then mediated by NagE and the Mpt glucose/mannose permease complex (MptBACD; Keffeler et al., 2021b).
HylA was identified as an MSCRAMM, thanks to its ligand-binding site including an Ig-like domain (Sillanpaa et al., 2004, 2009), and is considered a virulence factor with adhesion properties (Nallapareddy et al., 2005). A previous study showed that an MSCRAMM of E. faecalis named EfbA can play an important role in maintenance through biofilm formation, in addition to its role in fibronectin adhesion and aortic valve colonization, in a rat model (Singh et al., 2015). In the case of HylA, which also plays a role in biofilm formation, the protein does not fit this typical model, as its Ig-folded region is shorter than others and was suggested to have lower binding properties (Sillanpaa et al., 2009).
Proteins homologous to HylA have been found mainly in the genomes of the genera Enterococcus and Enterococcus_B (GTDB taxonomy), and in two forms: a short form with only the three lyase domains and a long form with additional domains in the N- and C-terminal regions (Figures 3, 6; Supplementary Figure 2C). The additional domains are involved in host-cell interactions, binding to cell wall-associated proteins, or found in virulence factors produced by Gram-positive pathogens. The long form is found in E. faecalis and E. hirae, and is always associated with a RAT sequence. Our results show that these sequences were acquired by HGTs and that the presence of a RAT sequence places them under the control of NagY. Moreover, E. faecalis and E. hirae are both involved in enterococcal infections in humans (Agudelo Higuita and Huycke, 2014). Thus, this regulatory change and the presence of these additional domains confer novel properties on the HylA enzyme domain that may have contributed to the successful colonization of the gut by E. faecalis.
HylA of E. faecalis shares similarities with two polysaccharide lyases from pathogens, Staphylococcus aureus (HysA) and Streptococcus pyogenes (HylA), but the conservation of these sequences is restricted to the lyase enzymatic domains (29% identity, 64% coverage, and 28% identity, 46% coverage, respectively). For similar coverage, these proteins are closer to the short Lyase_8 sequences of other Enterococcaceae (34 and 41% identity with the protein from E. cecorum ATCC 43198, for example). Moreover, in E. faecalis, HylA is anchored to the envelope, unlike its hyaluronidase homologs in S. aureus and S. pyogenes. Many surface proteins are thought to be anchored to the cell wall of Gram-positive bacteria via their C-terminus. All surface proteins harboring an LPXTG sequence motif may therefore be cleaved and anchored by a universal mechanism (Navarre and Schneewind, 1994; Siegel et al., 2017; Bhat et al., 2021). We unexpectedly showed that HylA favors biofilm formation on a GAG coating thanks to these domains, whereas hyaluronidases like HysA in S. aureus (Ibberson et al., 2016) have been shown to be effective in dispersing biofilm by cleaving the glycosidic linkages of the hyaluronic acid of the extracellular matrix. The fact that the biofilm dispersion phenotype is identical for the ΔnagY and ΔhylA mutants supports the view that these genes belong to the same regulon.
In this report, we also observed that HylA is involved in the colonization ability of E. faecalis in the G. mellonella in vivo model. This agrees with previous studies on Gram-positive pathogens, where HylA and its homologs were shown to be virulence factors (Hynes and Walton, 2000; Makris et al., 2004; Tsigrelis et al., 2006). Since hyaluronate is a major constituent of the ECM, hyaluronidases are essential components to increase the permeability of the host environment, to weaken connective tissues, and to allow the spread of pathogens from their initial site of infection (Hynes and Walton, 2000). The phylogenetic study showed that HylA was found only in E. faecalis, but with a very variable degree of sequence size conservation (Figure 7). Other HylA proteins, such as those of the pathogens S. agalactiae or L. monocytogenes, have a domain organization very close to that of the E. faecalis sequences, and the high sequence conservation between these distant species suggests recent HGTs. These observations suggest that the presence of a hylA gene would not be essential, or would be counter-selected, in E. faecalis strains in relation to their adaptive interactions with their host. Evidence from other Gram-positive pathogens shows that the MSCRAMM adhesin family may provide potential candidates for the development of novel immunotherapies (Rivas et al., 2004), opening interesting prospects for HylA in the future. We have established that NagY is able to regulate its own expression and that of the HylA hyaluronidase, which is involved in the degradation of hyaluronic acid, a component of the host ECM, in biofilm formation, and in pathogenicity. An interesting study performed on uropathogenic E. coli showed a similar involvement of the PafR antiterminator in metabolism during colonization, with potential targets contributing to virulence traits such as biofilm formation, adhesion, or motility, and specifically expressed in vivo (Baum et al., 2014).
In our Gram-positive bacterial model, this is the first evidence of an antiterminator regulon whose direct target genes are not all located in the close genomic environment of the regulator gene. Consequently, knowledge of the NagY regulon may open up interesting perspectives for deciphering the colonization mechanisms of the pathobiont E. faecalis.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.

Author contributions
DS, MS, OL, AR, and CM designed the study and the research. YQ and GF conceived, designed, and performed the phylogenomic analyses. DS, MS, PL, NS, AB, OL, and CM performed the experiments. CM coordinated the project. DS, MS, PL, NS, AB, OL, YQ, GF, AR, and CM wrote the manuscript. All authors contributed to the article and approved the submitted version.